SYSTEMS FOR EXPRESSING TOXIC PROTEINS, VECTORS
AND METHOD FOR PRODUCING TOXIC PROTEINS 

FIELD OF THE INVENTION 

The present invention relates to systems for expressing 
toxic proteins, to expression vectors comprising one of 
these systems, to prokaryotic cells transformed with 
these systems, and also to a method for synthesizing a 
toxic protein using these expression systems. 

It enables, for example, the overproduction in a
prokaryotic cell, for example Escherichia coli
(E. coli), of toxic hydrophobic proteins or peptides,
for example the overproduction of transmembrane domains
of viral envelope proteins.

It finds many applications in particular in research 
concerning the mechanisms of viral infections, and in 
the search for and development of novel active 
principles for combating viral infections. 

In the description which follows, the references 
between square brackets [ ] refer to the attached 
reference list. 

BACKGROUND OF THE INVENTION 

Determination of the three-dimensional (3D) structure 
is a decisive step in the structural and functional 
understanding of proteins. 

Considerable effort and resources have been, and
continue to be, devoted to this aim, and have increased
with the accumulation of data provided by the genome
sequencing programmes [1].

The two main techniques for establishing these protein 
structures are X-ray diffraction, carried out using 
crystallized proteins, and nuclear magnetic resonance 
(NMR) carried out using proteins in solution. NMR, 
which is very suitable for studying proteins with a 
molecular mass of less than 20 kDa, requires however, 
like X-ray diffraction, the production of large amounts 
of material. It also means, in most cases, that
material enriched in 15N and/or 13C must be prepared.



In this context, the bacterium is a means of production 
that is widely used by the scientific community [2].
The overexpression of proteins in bacteria does not, 
however, occur without problems. In fact, it gives rise 
to three situations: 

The first case, which is ideal, is that where the 
protein is overproduced in a form that is correctly 
spatially folded during its synthesis in vivo. This is 
not a rare situation, but neither is it frequent. It 
concerns essentially soluble proteins that are small, 
i.e. approximately 20 to 50 kDa. 



The second case, the most common, is that where 
the protein is overproduced and aggregated in the form 
of inclusion bodies. This concerns polytopic and/or 
large proteins. In this case, the kinetics of folding 
of the protein are clearly slower than its rate of 
biosynthesis. This promotes exposure of the hydrophobic 
regions of the protein, that are normally buried in the 
core thereof, to the aqueous solvent and generates non- 
specific interactions that result in the formation of 
insoluble aggregates. According to the degree of
disorder of this folding, the inclusion bodies can be
solubilized/unfolded under non-native conditions, with
urea or guanidine. The solubilized protein is then
subjected to various treatments, such as dialysis or
dilution, so as to promote, successfully in certain
cases, a native 3D folding.



The third case is that where the expression 
engenders a varying degree of toxicity. This goes from 
an absence of expression product if the bacterium 
manages to adapt itself, to death of the bacterium if 
the product is too toxic. It is a case which occurs 
quite frequently and most commonly with membrane 
proteins or membrane protein domains, for instance 
those of the envelope proteins of the hepatitis C virus 
[5] or of the human immunodeficiency virus [6]. 



The problem of toxicity relates essentially to the 
expression of membrane proteins, i.e. proteins having a 
hydrophobic domain. Now, these proteins are of growing 
interest. Firstly, they are relatively numerous since 
the establishment of the various genomes confirms that 
they represent approximately 30% of the proteins 
potentially encoded by these genomes [7]. Secondly, 
they constitute 70% of the therapeutic targets and 
their alteration is the cause of many genetic diseases
[8].



It is therefore essential to develop methods that 
facilitate or allow the expression of such proteins or 
of their membrane portion. 



Efforts have been made in this respect with, for 
example, the development of bacterial strains that 
either show better tolerance to the expression of
membrane proteins [9, 10], or have a stricter
regulation of the expression mechanism, as in
the case of the E. coli strain BL21(DE3)pLysS developed
by Stratagene. However, these improvements do not make
it possible to eliminate the toxicity phenomenon in all 
cases, in particular in the expression of hydrophobic 
peptides corresponding to membrane anchors. 

The treatment of hepatitis C currently represents one
of the major high-stakes areas of medicine. Hepatitis C
is caused by the hepatitis C virus (HCV), of the family
Flaviviridae, which specifically infects hepatic
cells [11]. This virus consists of a positive-sense RNA
of approximately 9500 bases which encodes a polyprotein
of 3033 residues [13], symbolized in the attached
Figure 1 by the rectangle 1A. This polyprotein is
cleaved, after expression, by endogenous and exogenous
proteases, so as to give rise to 10 different proteins.
Two of them, called E1 and E2, are glycosylated and
form the envelope of the virus. They each have membrane
domains called TM, in particular TME1 for the E1
protein and TME2 for the E2 protein. The cleavage
positions that generate them are indicated in Figure 1
by arrows with, mentioned below, a number which
corresponds to the position in the polyprotein of the
first amino acid of the sequence resulting from the
cleavage. The E1 and E2 proteins are symbolized by a
rectangle. The white portion of each rectangle
corresponds to the ectodomain (ed) and the shaded
domain to the transmembrane region (TM). The primary
sequence of the TMs is indicated at the bottom of the
figure in one-letter code, with numbers corresponding
to the position of the amino acids in the polyprotein
located at the ends of these domains. The stars
indicate the hydrophobic amino acids. These membrane
domains or membrane regions of the virus have
particular association properties that condition the
structuring of the viral envelope [12]. In this
respect, they constitute potential therapeutic targets.
An understanding of the mechanism of association of the
virus requires studies of the 3D structure of these
domains, in particular by means of the abovementioned
techniques, which involves producing these peptides in
abundant amounts, and also preferably via the
biosynthetic pathway in order to allow 15N and/or 13C
isotope labelling.

The various E1 expression trials of the prior art, in
particular in E. coli [14] [5] or in Sf9 insect cells
infected with baculoviruses [15], have not made it
possible to overproduce this E1 protein, in particular
due to the toxicity induced by its expression,
including in the "resistant" E. coli BL21(DE3)pLysS
strains described above. There has been no E2 protein
overexpression trial in bacteria. These toxicity
problems are essentially due to the C-terminal region
of the two proteins, which is rich in hydrophobic amino
acids that form transmembrane domains providing the
anchoring to the membrane of the endoplasmic reticulum.

There is therefore a real need for a system for
expressing toxic proteins which does not have the
drawbacks, limitations, deficiencies and
disadvantages of the techniques of the prior art.

In addition, there is a real need for an expression 
vector comprising such a system for expressing toxic 
proteins, making it possible to carry out a method for 
producing toxic proteins which does not have the 
drawbacks, limitations, deficiencies and disadvantages 
of the techniques of the prior art.

SUMMARY OF THE INVENTION

The aim of the present invention is precisely to 
provide a system for expressing a toxic protein, which 
satisfies, inter alia, the needs indicated above. 

This aim, and others, are achieved, in accordance with 
the invention, by means of an expression system 
characterized in that it comprises successively, in the
5'-3' direction, a nucleotide sequence encoding the
dipeptide Asp-Pro, referred to below as dp sequence,
and a nucleotide sequence (pt) encoding a toxic protein
(Pt). This system will be identified below by: dp-pt.

DETAILED DESCRIPTION OF THE INVENTION 

According to a particularly preferred embodiment of the 
present invention, the expression system also 
comprises, upstream of the dp sequence, a nucleotide 
sequence (ps) encoding a soluble protein (Ps). This
soluble protein may be, for example, glutathione
S-transferase (GST) or thioredoxin (TrX) or another
equivalent soluble protein. This expression system
according to the invention will be identified below by:
ps-dp-pt.

The dp-pt expression system of the present invention, 
which comprises a sequence encoding Asp-Pro (DP in one- 
letter code) placed upstream of the nucleotide sequence 
of the toxic protein, makes it possible, entirely 
unexpectedly, to suppress the toxic effect of the 
protein for the host cell. In addition, the inventors
have noted that, entirely surprisingly, the suppression
of toxicity of the protein in the host is even more
effective with the ps-dp-pt expression system when the
toxic peptide is produced as a C-terminal fusion with a
soluble protein, for example glutathione S-transferase
or thioredoxin, with the sequence Asp-Pro inserted
between the soluble protein and the toxic peptide.

The dp-pt or ps-dp-pt expression system of the present 
invention makes it possible to overproduce toxic 
proteins in host cells, in particular hydrophobic 
proteins, especially peptides which correspond to, or 
which comprise, hydrophobic domains of membrane- 
anchored proteins which may involve, for example, a 
membrane protein or a domain of a membrane protein. It 
may involve, for example, a protein of a virus, for 
example of a hepatitis C virus, of an AIDS virus, or of 
any other virus that is pathogenic for humans and, in 
general, for mammals. 

For example, the dp-pt or ps-dp-pt system of the
invention makes it possible to overproduce, in a host
such as E. coli, the transmembrane domains of the E1
and E2 proteins of the hepatitis C virus, called TME1
and TME2, corresponding respectively to the sequences:

TME1: 347-MIAGAHWGVLAGIAYFSMVGNWAKVLVVLLLFAGVDA-383
SEQ ID NO: 1

TME2: 717-MEYVVLLFLLLADARVCSCLWMMLLISQAEA-746
SEQ ID NO: 2

whereas this was not possible with the techniques of
the prior art.
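As a quick consistency check on the two peptides quoted above, the following minimal Python sketch compares the length of each sequence with the polyprotein positions given in the text. The set of residues counted as hydrophobic is an assumption made for illustration only (the source marks hydrophobic residues in Figure 1 without listing them), and the helper names are hypothetical. For TME2, the span 717-746 matches the sequence once the leading Met is set aside, which is consistent with the Figure 1 description (amino acids 2-31 of SEQ ID NO: 2).

```python
# Peptide sequences as quoted in the text (one-letter code).
TME1 = "MIAGAHWGVLAGIAYFSMVGNWAKVLVVLLLFAGVDA"  # SEQ ID NO: 1, positions 347-383
TME2 = "MEYVVLLFLLLADARVCSCLWMMLLISQAEA"        # SEQ ID NO: 2, leading Met + 717-746

# Assumption: an illustrative set of hydrophobic residues, not taken from the source.
HYDROPHOBIC = set("AILMFVW")


def span_length(first, last):
    """Number of residues between two polyprotein positions, inclusive."""
    return last - first + 1


def hydrophobic_fraction(peptide):
    """Fraction of residues belonging to the illustrative hydrophobic set."""
    return sum(residue in HYDROPHOBIC for residue in peptide) / len(peptide)


if __name__ == "__main__":
    print("TME1:", len(TME1), "residues; span 347-383 =", span_length(347, 383))
    print("TME2 without leading Met:", len(TME2) - 1,
          "residues; span 717-746 =", span_length(717, 746))
    print("TME1 hydrophobic fraction:", round(hydrophobic_fraction(TME1), 2))
    print("TME2 hydrophobic fraction:", round(hydrophobic_fraction(TME2), 2))
```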

The nucleotide sequences that can be used for
constituting the dp-pt system of the invention encoding
the TME1 (dp-pt(TME1)) or TME2 (dp-pt(TME2)) proteins can
be any of the possible sequences encoding respectively
the DP-TME1 and DP-TME2 fusion proteins. The sequences
encoding the TME1 and TME2 proteins may advantageously
be, for example, SEQ ID NO: 3 and SEQ ID NO: 4,
respectively, of the attached sequence listing. To
obtain the dp-pt system, the dp sequence encoding the
dipeptide Asp-Pro (DP) is added to these sequences.

The nucleotide sequences that can be used for
constituting the ps-dp-pt system of the invention
encoding the TME1 (ps-dp-pt(TME1)) or TME2
(ps-dp-pt(TME2)) proteins may be any of the possible
sequences encoding the Ps-DP-TME1 and Ps-DP-TME2 fusion
proteins, respectively. They may advantageously be, for
example, SEQ ID NOS: 34, 35 and 36 of the attached
sequence listing for TME1, making it possible to obtain
a Ps-DP-TME1 chimeric protein. They may advantageously
be, for example, SEQ ID NOS: 37, 38 and 39 of the
attached sequence listing for TME2, making it possible
to obtain a Ps-DP-TME2 chimeric protein.

In fact, the abovementioned nucleotide sequences have
optimized codons for the expression of TME1 and TME2 in
a bacterium, for example in E. coli.

A large number of HCV RNA sequences producing an 
infectious phenotype exist: these sequences can also be 
used in the present invention. 

The sequence encoding the dipeptide Asp-Pro may be, for
example: gacccg, or any other sequence encoding this
dipeptide.
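To make "any other sequence encoding this dipeptide" concrete, the short Python sketch below enumerates the six-nucleotide sequences that encode Asp-Pro under the standard genetic code (two codons for Asp, four for Pro). The function name is hypothetical; the example sequence gacccg from the text is one of the eight results.

```python
from itertools import product

# Standard genetic code, restricted to the two residues of the dipeptide.
ASP_CODONS = ("gat", "gac")
PRO_CODONS = ("cct", "ccc", "cca", "ccg")


def dp_encodings():
    """Return every six-nucleotide sequence that encodes Asp-Pro."""
    return [asp + pro for asp, pro in product(ASP_CODONS, PRO_CODONS)]


if __name__ == "__main__":
    for sequence in dp_encodings():
        print(sequence)
```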
The sequence encoding GST may be, for example, that
present in the pGEXKT plasmids, the sequence of which
corresponds to SEQ ID NO: 29 of the
attached sequence listing, or any equivalent sequence,
i.e. encoding this soluble protein. The sequence
encoding TrX may be, for example, that present in the
pET32a+ expression plasmid, the sequence of which
corresponds to SEQ ID NO: 30 of the
attached sequence listing, or any equivalent sequence,
i.e. encoding this soluble protein.

For the production of the toxic protein, the dp-pt or
ps-dp-pt expression system of the invention is placed
inside a host cell, for example by cloning in an
appropriate plasmid, by means of the usual genetic
recombination techniques for transforming a host.

The plasmid into which the expression system of the 
present invention may be cloned so as to form this 
vector will be chosen in particular according to the 
host cell. It may be, for example, the pT7-7 plasmid
(SEQ ID NO: 33 of the attached sequence
listing), a plasmid of the pGEX series (for example of
SEQ ID NO: 31 of the attached sequence
listing), sold for example by the company Pharmacia, or
a plasmid of the pET32 series (for example of
SEQ ID NO: 32 of the attached sequence listing), sold for
example by the company Novagen.

The plasmids of the pGEX series and of the pET32 series 
will advantageously be used for implementing the 
present invention. In fact, they already comprise a ps
sequence encoding a soluble protein (Ps), respectively
glutathione S-transferase and thioredoxin. Thus,
advantageously, the dp-pt system will be cloned into
these plasmids downstream of this ps sequence encoding
the soluble protein.



The present invention therefore also relates to an 
expression vector comprising a dp-pt or ps-dp-pt 
expression system according to the invention; in 
particular, a vector comprising a dp-pt expression 
system according to the invention and the 
oligonucleotide sequence of the pT7-7 plasmid, or a 
vector comprising a ps-dp-pt expression system 
according to the invention and the oligonucleotide 
sequence of a pGEX plasmid or of a pET32 plasmid. 



For example, the expression vectors of the present
invention that are suitable for a bacterial host such
as E. coli and that allow overexpression of the
abovementioned TME1 membrane protein may advantageously
have an oligonucleotide sequence chosen from
SEQ ID NOS: 40 (with pGEXKT), 42 (with
pET32a+) and 44 (with pT7-7) of the attached
sequence listing.

For example, the expression vectors of the present
invention that are suitable for a bacterial host such
as E. coli and that allow overexpression of the
abovementioned TME2 membrane protein may advantageously
have an oligonucleotide sequence chosen from
SEQ ID NOS: 41 (with pGEXKT), 43 (with
pET32a+) and 45 (with pT7-7) of the attached
sequence listing.



In fact, the abovementioned expression vectors have
codons that are optimized for the expression of the
chimeric proteins of the present invention, including
TME1 and TME2, in a bacterium, for example in E. coli.

The present invention also relates to a prokaryotic 
cell transformed with an expression vector according to 
the invention. This prokaryotic cell transformed with 
the expression vector of the present invention should 
preferably allow overexpression of the toxic protein 
for which the vector codes. Thus, any host cell capable 
of expressing the expression vector of the present 
invention can be used, for example E. coli,
advantageously the E. coli strain BL21(DE3)pLysS.

The present invention also relates to a method for 
producing a toxic protein by genetic recombination, 
comprising the following steps: 

- transforming a host cell with an expression vector
according to the invention,

- culturing the transformed host cell under culture
conditions such that it produces a fusion protein
comprising the dipeptide Asp-Pro followed by the
peptide sequence of the toxic protein from said
expression vector,

- isolating said fusion protein, and

- cleaving said fusion protein so as to recover the
toxic protein.

The steps for transforming, culturing and isolating the 
chimeric protein produced can be carried out by means 
of the usual techniques of genetic recombination, for 
example by means of techniques such as those that are 
described in document [25]. 



The step consisting in isolating the fusion protein can
be carried out by means of the usual techniques known
to those skilled in the art for isolating a protein 
from a cell extract. 

The fusion protein produced by means of the method of
the invention has a "soluble protein-Asp-Pro-toxic
protein" sequence. In the present description, the
dipeptide Asp-Pro is also called DP according to the 
one-letter amino acid code. 

For example, when the toxic protein is TME1, the fusion
protein may have the sequence SEQ ID NO: 46 of
the attached sequence listing, which corresponds to the
GST-DP-TME1 fusion protein; the sequence SEQ ID
NO: 48 of the attached sequence listing, which
corresponds to the TrX-DP-TME1 fusion protein; or the
sequence SEQ ID NO: 50 of the attached sequence
listing, which corresponds to the M-DP-TME1 fusion
protein.

For example, when the toxic protein is TME2, the fusion
protein may have the sequence SEQ ID NO: 47 of
the attached sequence listing, which corresponds to the
GST-DP-TME2 fusion protein; the sequence SEQ ID
NO: 49 of the attached sequence listing, which
corresponds to the TrX-DP-TME2 fusion protein; or the
sequence SEQ ID NO: 51 of the attached sequence
listing, which corresponds to the M-DP-TME2 fusion
protein.

The step consisting of cleavage of this fusion protein 
can advantageously be carried out by means of formic 
acid, which cleaves the fusion protein at the dipeptide 
Asp-Pro. It may be carried out, moreover, by means of
any appropriate technique known to those skilled in the
art for recovering a protein from a sample using a
fusion protein.
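The effect of cleavage at the Asp-Pro site on a fusion protein can be sketched at the sequence level as follows. This is only an illustration under stated assumptions: it assumes that the peptide bond between Asp and Pro is the one broken, as is generally described for acid cleavage, so that Asp remains on the N-terminal fragment and Pro becomes the first residue of the released peptide. The function name and the short soluble-partner stand-in "MGSSK" are hypothetical and are not taken from the sequence listing.

```python
def cleave_asp_pro(protein):
    """Split a one-letter protein sequence at every Asp-Pro (D-P) bond.

    Assumption: the D-P peptide bond itself is broken, so D ends the
    N-terminal fragment and P starts the C-terminal fragment.
    """
    fragments = []
    start = 0
    for i in range(len(protein) - 1):
        if protein[i] == "D" and protein[i + 1] == "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    fragments.append(protein[start:])
    return fragments


if __name__ == "__main__":
    soluble_part = "MGSSK"  # hypothetical stand-in for a soluble protein (Ps)
    tme1 = "MIAGAHWGVLAGIAYFSMVGNWAKVLVVLLLFAGVDA"  # SEQ ID NO: 1 as quoted in the text
    fusion = soluble_part + "DP" + tme1
    for fragment in cleave_asp_pro(fusion):
        print(fragment)
```

Neither TME1 nor TME2, as quoted in the text, contains an internal Asp-Pro pair, so in this sketch each fusion yields exactly two fragments.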

The inventors are the first to have found a system that
is really effective for producing and even
overproducing, in particular in the Escherichia coli
(E. coli) bacterium, hydrophobic peptides corresponding
to the membrane domains of the E1 and E2 proteins of
the hepatitis C virus envelope, the expression of which
is lethal for the microorganism.

The field of application of the present invention 
concerns mainly the production of hydrophobic peptides 
on a large scale, in particular for fundamental and 
industrial research. In addition, the production of the 
chimeric protein consisting of the soluble protein, of 
the dipeptide Asp-Pro and of the hydrophobic peptide can
be used for a functional purpose, in particular for
obtaining information on the degree of oligomerization
of the membrane domain or else on its
heteropolymerization capacity.

The fusion proteins, or chimeric proteins, are produced
from their coding DNA, present for example in
commercial plasmids, downstream of which the DNA
encoding the Asp-Pro sequence, followed by that
encoding the toxic peptide, is introduced in frame. This
application can be commercialized in the form of
bacterial expression plasmids which will include the
sequence of the Asp-Pro site, downstream of that of the
soluble proteins already present. The corresponding
plasmid will be described, for example, as a tool that
facilitates the production, via the biological pathway,
of toxic membrane peptides or proteins.

Thus, the present invention is applicable to any system
for overexpressing recombinant proteins, with or 
without fusion to a soluble protein such as, for 
example, GST or thioredoxin, including a non-natural 
Asp-Pro sequence inserted upstream of a sequence 
encoding a toxic domain of the protein, for example a 
membrane domain of a protein. 

Other characteristics and advantages of the present 
invention will become further apparent to those skilled 
in the art on reading the following examples given by 
way of non-limiting illustration, with reference to the 
sequence listing and to the figures that are attached. 

Brief description of the attached sequence listing

SEQ ID NOS: 1 and 2: peptide sequences of TME1 and of
TME2, respectively.

SEQ ID NOS: 3 and 4: sequences encoding the TME1
peptide and the TME2 peptide, respectively.

SEQ ID NOS: 5 and 6: respectively, oligonucleotide (+)
for insertion into pT7-7 (OL13(+)) and oligonucleotide
(-) for insertion into pT7-7 (OL14(-)).

SEQ ID NOS: 7 and 8: respectively, coding sense DNA of
TME1 + Cla I site in the 3' position and anticoding
sense DNA of TME1 + Cla I site in the 5' position
(sequence complementary to SEQ ID NO: 7).

SEQ ID NOS: 9 and 10: respectively, coding sense
oligonucleotide (OL11(+)) and anticoding sense
oligonucleotide (OL12(-)) for the synthesis of TME1.

SEQ ID NO: 11: oligonucleotide (+) for insertion into
pGEXKT without dp site (OL15(+)).

SEQ ID NO: 12: oligonucleotide (+) for insertion into
pGEXKT with dp site (OL17(+)).

SEQ ID NO: 13: oligonucleotide (-) for insertion into
pGEXKT (OL16(-)).

SEQ ID NO: 14: oligonucleotide (+) for insertion into
pET32a (OL18(+)) (hybridizes to the segment 915-932 of
pGEXKT).

SEQ ID NOS: 15 and 16: respectively, oligonucleotides
(+) (OL19(+)) and (-) (OL20(-)) for insertion into
pT7-7 of the DNA encoding M-DP-TME1.

SEQ ID NOS: 17 and 18: respectively, oligonucleotide
(+) for insertion into pT7-7 (OL23(+)) and
oligonucleotide (-) for insertion into pT7-7 (OL24(-)).

SEQ ID NOS: 19 and 20: respectively, coding sense DNA
for TME2 + Nde I site in the 5' position and Hind III
site in the 3' position; and anticoding sense DNA of
TME2 + Nde I site in the 3' position and Hind III site
in the 5' position (sequence complementary to ID No.
17).

SEQ ID NOS: 21 and 22: respectively, coding sense
oligonucleotide (OL21(+)) and anticoding sense
oligonucleotide (OL22(-)) for the synthesis of TME2.

SEQ ID NO: 23: oligonucleotide (+) for insertion into
pGEXKT without dp site (OL25(+)).

SEQ ID NOS: 24 and 25: respectively, oligonucleotides
(+) (OL27(+)) and (-) (OL26(-)) for insertion into
pGEXKT with dp site.

SEQ ID NOS: 26 and 27: respectively, oligonucleotides
(+) (OL28(+)) and (-) (OL29(-)) for insertion into
pT7-7 of the DNA encoding M-DP-TME2.

SEQ ID NO: 28: end of the sequence of the GST soluble
protein followed by the thrombin site encoded in the
pGEXKT plasmid.

SEQ ID NO: 29: DNA encoding the GST protein in the
pGEXKT plasmid.

SEQ ID NO: 30: DNA encoding thioredoxin (TrX) in the
pET32a+ plasmid.

SEQ ID NOS: 31, 32 and 33: respectively, pGEXKT,
pET32a+ and pT7-7 expression plasmids.

SEQ ID NOS: 34, 35 and 36: respectively, expression
systems according to the invention encoding the
GST-DP-TME1, TrX-DP-TME1 and M-DP-TME1 fusion proteins.

SEQ ID NOS: 37, 38 and 39: respectively, expression
systems according to the invention encoding the
GST-DP-TME2, TrX-DP-TME2 and M-DP-TME2 fusion proteins.

SEQ ID NOS: 40 and 41: respectively, pGEXKT-dp-pt(TME1)
and pGEXKT-dp-pt(TME2) expression vectors according to
the invention encoding the GST-DP-TME1 and GST-DP-TME2
fusion proteins.

SEQ ID NOS: 42 and 43: respectively, pET32a-dp-pt(TME1)
and pET32a-dp-pt(TME2) expression vectors according to
the invention encoding the TrX-DP-TME1 and TrX-DP-TME2
fusion proteins (code via the complementary strand).

SEQ ID NOS: 44 and 45: respectively, pT7-7-dp-pt(TME1)
and pT7-7-dp-pt(TME2) expression vectors according to
the invention encoding the M-DP-TME1 and M-DP-TME2
fusion proteins.

SEQ ID NOS: 46 and 47: respectively, GST-DP-TME1 and
GST-DP-TME2 fusion proteins according to the invention
obtained from the pGEXKT-dp-pt(TME1) and
pGEXKT-dp-pt(TME2) plasmids.

SEQ ID NOS: 48 and 49: respectively, TrX-DP-TME1 and
TrX-DP-TME2 fusion proteins according to the invention
obtained from the pET32a-dp-pt(TME1) and
pET32a-dp-pt(TME2) plasmids.

SEQ ID NOS: 50 and 51: respectively, M-DP-TME1 and
M-DP-TME2 fusion proteins according to the invention
obtained from the pT7-7-dp-pt(TME1) and
pT7-7-dp-pt(TME2) plasmids.

SEQ ID NOS: 52 and 53: respectively, GST and TrX
proteins encoded by the pGEXKT and pET32a+ vectors.

BRIEF DESCRIPTION OF THE FIGURES 

Figure 1: diagrammatic representation of a portion
of the HCV polyprotein and peptide sequence of the
C-terminal membrane domains of the E1 and E2 envelope
proteins. The peptide sequences represented correspond
to the infectious type #D00831 and #M67463 for TME1
(SEQ ID NO: 1) and TME2 (amino acids 2-31 of SEQ ID NO:
2), respectively, obtained from the public sequence
library of the European Molecular Biology Laboratory
(EMBL).

Figure 2: creation of the DNA encoding the 
C-terminal membrane domain of the HCV E1 envelope
protein and additional sequences in the 5' and 3' 
positions for cloning in various plasmids. The 
sequences represented in this figure are reported in
the attached sequence listing. In particular, Figure 2A
shows SEQ [...]: 1 and [...]

Figure 3: creation of the DNA encoding the
C-terminal membrane domain of the HCV E2 envelope
protein and additional sequences in the 5' and 3'
positions for cloning in various plasmids. The 
sequences represented in this figure are reported in 
the attached sequence listing. In particular, Figure 
3A shows SEQ ID NO: 2; Figure 3B shows, from top to 
bottom, SEQ ID NOS: 2 and 4; Figure 3C shows, from top 
to bottom, SEQ ID NOS: 17, 21, 22, and 18; and Figure 
3D shows, from top to bottom, SEQ ID NOS: 21, 22, 17, 
18, 23, 24, 25, 14, 25, 26 and 27. 



Figure 4, panels A to F: toxicity of the membrane 
domains expressed in the bacterium and suppression of 
this toxicity by insertion of a dp site. Panels A, C 
and E are graphic representations of optical density 
(OD) measurements at 600 nm as a function of time (t) 
in hours of production of various proteins in a 
bacterium using or not using the expression system of 
the present invention. Panels B, D and F are 
representations of the gels of migration of the 
proteins of panels A, C and E, respectively. 



Figures 5A and B: overexpression of the 
thioredoxin-Asp-Pro-Pt chimeric proteins (Pt = membrane 
domains of the proteins) in the bacterium. Figure 5A is 
a graphic representation of the optical density (OD)
measurements at 600 nm as a function of the time in
hours of production of various proteins in a bacterium
using or not using the expression system of the present
invention. Figure 5B is a representation of a gel of
migration of the proteins of Figure 5A.



Figure 6: expression and purification of the 
GST-TME2 fusion (or chimeric) protein, and comparison 
with GST alone. This figure represents, at the top, the 
peptide sequences of GST (SEQ ID NO: 52) and GST-TME2, 
and, at the bottom, the gels obtained by 
electrophoresis, showing that, unlike GST alone, 
GST-TME2 is insoluble. The latter is produced in the 
form of inclusion bodies that cannot fold correctly. 

Figures 7A and 7B: graphic representations of 
comparative experimental results showing the effect of 
the DP dipeptide (dp-pt oligonucleotide sequence in 
accordance with the present invention) and of the DP 
dipeptide and the soluble protein (ps-dp-pt 
oligonucleotide sequence in accordance with the present 
invention) on the synthesis of the TME1 and TME2 toxic 
proteins in accordance with the present invention. 



EXAMPLES 



In these examples, the oligonucleotides used were
ordered from Laboratoires EUROBIO; the plasmids were
prepared with the QIAprep kit (brand name) from Qiagen;
the DNA sequences were sequenced with the ABI PRISM
(registered trade mark) BigDye (brand name) Terminator
cycle kit from Applied Biosystems; the E. coli strains
BL21(DE3) and BL21(DE3)pLysS were obtained from
Stratagene; the C41 and C43 (BL21(DE3)) strains were
provided by Dr. Bruno Miroux (CNRS-CEREMOD, Centre for
Research on molecular endocrinology and development);
the DNA restriction and modification enzymes were
obtained from New England Biolabs; the protein
electrophoreses were carried out with a miniprotean 3
(brand name) from Bio-Rad Laboratories; the plasmid pCR
(registered trade mark) T7 topo TA was obtained from
Invitrogen; the pET32a+ plasmid was obtained from
Novagen; the pT7-7 and pGP1-2 plasmids and the K38
strain [22] were requested from Prof. Tabor (Department
of Biological Chemistry, Harvard Medical School); the
pGEX-KT plasmid was requested from Prof. Dixon
(Department of Biological Chemistry, University of
Michigan Medical School); the other products were
obtained from Sigma.

In the following examples, the production of the TME1 
and TME2 peptides was firstly carried out without the 
expression system of the present invention, and then as 
a fusion with a soluble protein and, finally, as a 
fusion with GST with insertion of the Asp-Pro ("DP" in 
one-letter coding) site between the soluble protein and 
TME1 or TME2. 

"SEQ ID NO:" refers to the attached sequence
listing.

Example 1: Synthesis of the expression system 

1.1) CONSTRUCTION OF THE pT7-7-ptTME1 AND pT7-7-ptTME2
EXPRESSION VECTORS

The DNA encoding the two domains was synthesized
de novo using the appropriate oligonucleotides. The
codons were chosen according to their greatest
frequency of use in the bacterium, as quantified by
Sharp et al. [17]. The constructs are described in the
attached Figure 2 for TME1 and in the attached Figure 3
for TME2.
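
As an illustration of this codon choice, the following minimal sketch back-translates a peptide using one preferred codon per residue. The table is an assumption (a commonly used set of preferred E. coli codons, not the data of reference [17]), and the function name back_translate is hypothetical. For the residues M, D, P, I, A and G it gives the same codons as those found in oligonucleotide OL19 of Example 6.

```python
# Illustrative preferred-codon table for E. coli (assumed values,
# one codon per amino acid; not taken from reference [17]).
PREFERRED_CODON = {
    "A": "GCT", "C": "TGC", "D": "GAC", "E": "GAA", "F": "TTC",
    "G": "GGT", "H": "CAC", "I": "ATC", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT",
    "S": "TCT", "T": "ACC", "V": "GTT", "W": "TGG", "Y": "TAC",
}


def back_translate(peptide: str, stop: str = "TAG") -> str:
    """Return a coding DNA sequence using one preferred codon per residue.

    `stop` is appended as the stop codon; pass an empty string to omit it.
    """
    codons = [PREFERRED_CODON[aa] for aa in peptide.upper()]
    return "".join(codons) + stop


# Start of the (M)DP-TME1 coding sequence carried by OL19.
print(back_translate("MDPIAGA", stop=""))  # ATGGACCCGATCGCTGGTGCT
```

A single codon per residue keeps the sketch short; a real design would also check the resulting DNA for unwanted restriction sites.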

Each synthetic DNA was generated using a set of two
long and overlapping oligonucleotides, OL11 (SEQ ID
NO: 9) and OL12 (SEQ ID NO: 10) for TME1, and OL21
(SEQ ID NO: 19) and OL22 (SEQ ID NO: 20) for TME2,
which were amplified after hybridization with two
external oligonucleotides chosen according to the
cloning in a given plasmid. Thus, the clonings in
pT7-7 were carried out using the set of external
oligonucleotides OL13 (SEQ ID NO: 5) and OL14 (SEQ ID
NO: 6) for TME1, and OL23 (SEQ ID NO: 15) and OL24
(SEQ ID NO: 16) for TME2.

Each synthetic DNA was generated using a set of four 
oligonucleotides: two long and overlapping and two 
short and external. The DNAs were amplified by the 
polymerase chain reaction method, referred to as "PCR" 
[18], and then cloned into a bacterial plasmid pCR 
(brand name) T7 topo TA. The synthesized DNAs were 
sequenced and then subcloned into the pT7-7 bacterial 
expression vector [19] using the Nde I restriction site
in the 5' position and the Cla I or Hind III
restriction site in the 3' position.
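
The template-free amplification relies on the two long oligonucleotides overlapping at their 3' ends. The sketch below, with hypothetical function names and an invented 60-base sequence rather than the actual long oligonucleotides, shows how the full sense strand can be deduced from a sense and an antisense oligonucleotide that share an overlap of about twenty bases.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (upper case ACGT)."""
    return seq.translate(COMPLEMENT)[::-1]


def assemble(sense: str, antisense: str, min_overlap: int = 15) -> str:
    """Join a sense and an antisense oligonucleotide into the full sense strand.

    The 3' end of `sense` must match the 5' end of the reverse complement
    of `antisense` over at least `min_overlap` bases.
    """
    downstream = reverse_complement(antisense)
    longest = min(len(sense), len(downstream))
    # Try the longest possible overlap first, then shorter ones.
    for n in range(longest, min_overlap - 1, -1):
        if sense[-n:] == downstream[:n]:
            return sense + downstream[n:]
    raise ValueError("no overlap of the required length")


# Invented 60-base target, split into two 40-base oligonucleotides
# that overlap by 20 bases (hypothetical sequences).
target = "ATGGACCCGATCGCTGGTGCTCACTGGGGTGTTCTGGCTGGTATCGCTTACTTCTCTATG"
long_sense = target[:40]
long_antisense = reverse_complement(target[20:])
print(assemble(long_sense, long_antisense) == target)  # True
```

The short external oligonucleotides would then be used to amplify this assembled strand and add the restriction sites.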



In Figure 2:

A: TME1 peptide sequence of subtype #D00831. The
numbering corresponds to the position of the sequence
in the polyprotein as described in Figure 1.
B: DNA sequence encoding the membrane domain with
codons optimized for expression in the bacterium.
C and D: Strategy for DNA amplification without
template. The sense and antisense orientations of
the oligonucleotides are indicated, respectively, by
the signs (+) and (-). The long oligonucleotides
overlap by about twenty bases so that they prime each
other and then form the template. The short
oligonucleotides make it possible to amplify the
template by PCR, integrating the desired restriction
sites according to the plasmids used. The insertion
into pT7-7 was carried out with the pair of
oligonucleotides OL13 (SEQ ID NO: 5) and OL14 (SEQ ID
NO: 6), via a subcloning in pCRT7 topo, integrating the
Nde I and Hind III sites. The insertion into pGEXKT was
carried out according to the same method, with the pair
of oligonucleotides OL15 (SEQ ID NO: 11) and OL16 (SEQ
ID NO: 13), integrating the BamH I and EcoR I sites.
The insertion of the dp site (gacccg) and the cloning
in pGEXKT were carried out with the pair of
oligonucleotides OL17 (SEQ ID NO: 12) and OL16 (SEQ ID
NO: 13). The construct in pGEXKT was transferred into
pET32a, which encodes thioredoxin, with the pair of
oligonucleotides OL18 (SEQ ID NO: 14) and OL16 (SEQ ID
NO: 13). The oligonucleotide OL18 (SEQ ID NO: 14)
hybridizes in the terminal region of the DNA encoding
GST in pGEXKT. The amplified sequence integrates the
end of GST (SDLSGGGGG) followed by the thrombin site
(LVPRGS) (SEQ ID NO: 28), by the DP site and by the
membrane passage. After cloning, the DNA inserted into
pET32a makes it possible to express the
thioredoxin-SDLSGGGGGLVPRGS-DP-TME1 chimera (SEQ ID
NO: 48).

In Figure 3:

The legend is identical to Figure 2, but the peptide
sequence is that of subtype #M67463. The insertion into
pT7-7 was carried out with the pair of oligonucleotides
OL23 and OL24 (SEQ ID NO: 17 and SEQ ID NO: 18,
respectively), via a subcloning in pCRT7 topo,
integrating the Nde I and Hind III sites. The insertion
into pGEXKT was carried out according to the same
method, with the pair of oligonucleotides OL25 and OL26
(SEQ ID NO: 23 and SEQ ID NO: 25, respectively),
integrating the BamH I and EcoR I sites. Insertion of
the dp site (gacccg) and the cloning in pGEXKT were
carried out with the pair of oligonucleotides OL27 and
OL26 (SEQ ID NO: 24 and SEQ ID NO: 25, respectively).
Insertion into pET32a was carried out as described in
Figure 2, using the pair of oligonucleotides OL18 and
OL26 (SEQ ID NO: 14 and SEQ ID NO: 25, respectively).

1.2) CONSTRUCTION OF THE pGEXKT-ptTME1, pGEXKT-ptTME2,
pGEXKT-dp-ptTME1 AND pGEXKT-dp-ptTME2 EXPRESSION VECTORS

The pGEXKT-ptTME1 and pGEXKT-ptTME2 expression vectors
were constructed by PCR as described in the attached
Figures 2 and 3. The template DNA used to amplify the
DNAs encoding TME1 or TME2 is that cloned into the
pT7-7 plasmids. The cloning of TME1 into the pGEXKT
plasmid [20, 21] was carried out using the pair of
oligonucleotides OL15 (SEQ ID NO: 11) and OL16 (SEQ ID
NO: 13), allowing insertion of the BamH I restriction
site in the 5' position and the EcoR I restriction site
in the 3' position. The cloning of TME2 into the same
vector was carried out using the pair of
oligonucleotides OL25 (SEQ ID NO: 21) and OL26 (SEQ ID
NO: 23).

As indicated in Figure 2, the insertion of the dp site
at the N-terminal position of TME1 was carried out by
replacing the 5' oligonucleotide OL15 (SEQ ID NO: 11)
with the oligonucleotide OL17 (SEQ ID NO: 12). The
insertion of the dp site at the N-terminal position of
TME2 was carried out by replacing the 5'
oligonucleotide OL25 (SEQ ID NO: 21) with the
oligonucleotide OL27 (SEQ ID NO: 22), as shown in
Figure 3.

1.3) CONSTRUCTION OF THE pET32a-dp-TME1 AND
pET32a-dp-TME2 EXPRESSION VECTORS

The pET32a-dp-TME1 and pET32a-dp-TME2 expression
vectors were constructed by PCR as described in the
attached Figures 2 and 3, using the set of
oligonucleotides indicated. The upstream
oligonucleotide integrates an EcoR V site and
hybridizes with the terminal region of the gene
encoding GST. It makes it possible to integrate the
5-glycine tail and the thrombin-cleavage site present
in the plasmid. The downstream oligonucleotide is the
same as that used for the cloning in pGEXKT.

The insertion into the pET32a plasmid is carried out
via the Msc I/EcoR V sites in the 5' position and the
EcoR I site in the 3' position. It makes it possible to
insert, in frame at the end of the thioredoxin
sequence, the 5-glycine tail, the thrombin-cleavage
site, the DP site and the membrane passage. The
original pET32a plasmid, which serves as a control,
encodes thioredoxin followed by a sequence integrating
various elements that have not been deleted and that
contribute, to a large degree, to the mass of the
chimeric protein produced.

The template DNA used to amplify the DNAs encoding TME1
or TME2 is that cloned into the pGEXKT-dp-ptTME1 or
pGEXKT-dp-ptTME2 plasmids. For TME1, the cloning into
pET32a+ was carried out using the pair of
oligonucleotides OL18 (SEQ ID NO: 14) and OL16 (SEQ ID
NO: 13). The cloning of TME2 into the same vector was
carried out using the pair of oligonucleotides OL18
(SEQ ID NO: 14) and OL26 (SEQ ID NO: 23), as indicated
in Figure 3.

Example 2: Expression of sequences encoding the TME1
and TME2 proteins alone

The expression of the sequences encoding the TME1 and 
TME2 domains alone was tested by thermal or chemical 
induction and using various bacterial strains as 
described below. 

2.1) THERMAL INDUCTION SYSTEM 

The system developed by Tabor [22] makes it possible to
express a protein by thermal induction using two
vectors in the same bacterium, pT7-7 and pGP1-2.

The pT7-7 plasmid contains the DNA to be expressed,
placed under the control of a φ10 promoter recognized
by the T7 phage RNA polymerase. The pGP1-2 plasmid
contains the gene encoding the T7 phage polymerase,
placed under the control of a λpL promoter. This
promoter is repressed by a thermosensitive repressor,
cI857, that is itself also present in pGP1-2. At 30°C,
cI857 is normally expressed and represses the λpL
promoter, which blocks the expression of the polymerase
and therefore also that of the protein of interest.

The induction is triggered by switching the culture
from 30 to 42°C for 15-30 min, and then the expression
continues at 37°C. This system is therefore
particularly suitable when it is necessary to strictly 
control the expression of a given protein, in 
particular if said protein is toxic for the bacterium. 

2.2) CHEMICAL INDUCTION SYSTEM

The same pT7-7 plasmid containing the DNA to be
expressed is this time introduced into E. coli bacteria
of the type BL21(DE3) (B F- dcm ompT hsdS(rB- mB-) gal
λ(DE3)) and BL21(DE3)pLysS (B F- dcm ompT hsdS(rB- mB-)
gal λ(DE3) [pLysS Camr]). These bacteria have been
modified so as to contain in the genome a copy of the
gene encoding the T7 phage RNA polymerase, placed under
the control of a lacUV5 promoter that can be induced
with isopropyl-1-thio-β-D-galactoside (IPTG). In this
case, the bacteria are cultured at their optimum
temperature of 37°C, or less if necessary. The
expression is induced by adding IPTG to the culture.
The BL21(DE3)pLysS strain is particularly suitable for
proteins whose basal expression is toxic for the
host bacterium. In fact, the presence of the pLysS
plasmid allows continuous expression, at a low level,
of T7 phage lysozyme. This inhibits the T7 phage
polymerase, the weak expression of which in the absence
of induction could allow the basal expression of
toxic protein.

The inventors also tested the expression of the
membrane domains alone in strains called C41 and C43
[10], which were selected so as to withstand the
expression of toxic membrane proteins. These strains
are derived from the BL21(DE3) strain and are used in
the same way as the latter.

2.3) EXPRESSION TESTS 

According to the system tested, the corresponding
plasmids were introduced by transformation into the
various strains of E. coli: K38 (HfrC λ) for the Tabor
thermal induction system or the various BL21 strains
for the chemical induction. Table 1 below summarizes
the tests performed.



Table 1

Induction   Strain            Plasmid
Thermal     K38               pT7-7 + pGP1-2
Chemical    BL21(DE3)         pT7-7
Chemical    BL21(DE3)pLysS    pT7-7
Chemical    C41 (BL21(DE3))   pT7-7
Chemical    C43 (BL21(DE3))   pT7-7


In each case, about ten transformants were placed in
culture in order to test the expression. Briefly, the
bacteria were cultured in 5 ml of LB (10 g tryptone,
5 g yeast extract, 5 g NaCl, qs 1 litre H2O),
supplemented with 50 µg/ml of ampicillin (necessary in
order to maintain pT7-7 in the bacterium) and 60 µg/ml
of kanamycin (necessary in order to maintain pGP1-2 in
the bacterium), and then cultured until saturation,
either at 30°C for K38 or at 37°C for BL21(DE3). The
cultures were then diluted to 1/10 in the same culture
medium and cultured to an optical density (OD) of 1,
measured at 600 nm on a Philips PU8740
spectrophotometer (brand name).

The expression was then induced either thermally (K38)
at 42°C for 15 min, or chemically (BL21(DE3)) by adding
1 mM IPTG. It was continued for 3-5 hours at 37°C. The
OD600nm of the cultures was measured at various times.

At the end of the expression, a volume of culture
containing the equivalent of 0.1 OD of bacteria was
removed. The bacteria were harvested by centrifugation
and suspended in 50 µl of lysis solution (LS: 50 mM
Tris-Cl, pH 8.0, 2.5 mM EDTA, 2% SDS, 4 M urea, 0.7 M
β-mercaptoethanol). After a few minutes at ambient
temperature, 10 µl were loaded onto a 16.5%
polyacrylamide gel for "Tricine" type electrophoresis
[23], which makes it possible to obtain good separation
of low molar mass proteins.
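
The sampling rule above (a volume equivalent to 0.1 OD) amounts to a simple proportion. A minimal sketch, assuming that one OD unit corresponds to 1 ml of culture at an OD600 of 1 (the function name is hypothetical):

```python
def volume_for_od_units(od600: float, od_units: float = 0.1) -> float:
    """Return the culture volume (ml) that contains `od_units` OD units.

    Assumes one OD unit is 1 ml of culture at OD600 = 1.
    """
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return od_units / od600


# A culture at OD600 = 2.0 requires 0.05 ml (50 µl) for 0.1 OD.
print(volume_for_od_units(2.0))
```

Sampling a constant number of OD units rather than a constant volume keeps the amount of bacterial material loaded per lane comparable between cultures that have grown to different densities.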

In Figure 4:

Panels A, C and E: The bacteria were transformed with
the plasmids pT7-7, pT7-7-TME1, pT7-7-TME2 (panel A),
pGEXKT, pGEXKT-TME1, pGEXKT-TME2 (panel C), and
pGEXKT-dp-TME1 and pGEXKT-dp-TME2 (panel E), and then
cultured and induced as described above. The bacterial
growth was followed by measuring the increase in
turbidity of each culture, as the optical density at
600 nm, as a function of the time in hours.

Panels B, D and F: The bacteria were sampled at the
time indicated in the text and treated as described
above. They were then loaded onto an electrophoresis
gel, either 16.5% acrylamide of the "Tricine" type
(panel B), or 14% acrylamide of the Laemmli SDS-PAGE
type (panels D and F). The electrophoresis shown in
panel F was run for a longer period of time than that
shown in panel D, in order to improve the separation of
the bands in the 30 000 Da region. After migration, the
gels were stained for 10 minutes with Coomassie blue in
a solution of 40% methanol, 10% acetic acid and 0.1%
Coomassie blue R250, and then destained in a solution
of 10% methanol, 10% acetic acid and 1% glycerol.

Whatever the system tested, the first observation is
that the frequency of transformation of the bacteria
was low. For the bacteria that could be selected, the
result of the expression tests was systematically
negative. An example is given in Figure 4, panels A and
B, with the series BL21(DE3)pLysS {[pT7-7],
[pT7-7-TME1] or [pT7-7-TME2]}. As illustrated by
comparing the growth curves of panel A of Figure 4, the
inventors noted, with the clones transformed with
pT7-7-TME1 or pT7-7-TME2 and resistant on solid medium,
that the induction stops the bacterial growth virtually
immediately, unlike the clones containing the plasmid
alone. Similarly, as can be seen in Figure 4(B), no
band of proteins migrating in the region corresponding
to the molecular mass of the expression products
(~3000-4000 Da) or of oligomers thereof ({1, 2, 3,
etc.} x molecular mass) can be observed.

The most probable explanation for this situation is
that the expression of the membrane domains is very
toxic for the bacterium. The difficulty in obtaining
transformants implies that a basal expression, even
very low, is sufficient to kill them. It also shows
that the pLysS system is not perfect for preventing
this basal expression. Among the bacteria that
withstand the transformation step, the induction of
expression of the hydrophobic domains becomes
immediately lethal. The systems used do make
it possible to protect the host bacterium against a
basal expression, but as soon as this expression is
induced, the toxicity is immediate and the bacteria are
killed.

Example 3: Expression of sequences encoding the
GST-TME1 and GST-TME2 fusion proteins

The expression vectors were constructed as described in
Example 1, and then introduced into the BL21(DE3)pLysS
bacteria. The BL21(DE3)pLysS bacteria were used in the
interests of comparison with the preceding experiments
since the expression of GST or of its chimeras does not
require the DE3-pLysS system.

The expression was induced with IPTG as for that of the 
domains alone. The characteristics of the proteins 
produced are summarized in Table 2 below. 



Table 2

Plasmid     Chimera,        Construct            Size,  Mass,
            abbreviation                         aa     Da
pGEXKT      GST, G          1M-D239              239    27469
pGEXKT-T1   GST-TME1, GT1   1M-S233-347M-A383    269    30506
pGEXKT-T2   GST-TME2, GT2   1M-S233-717E-A746    263    30191


The amino acids (aa) are indicated with the one-letter 
code. The numbering of the sequences is done with 
respect to the proteins of origin, GST and viral 
polyprotein. That which refers to the membrane domains
is indicated in italics.

Panels C and D of the attached Figure 4 show the
results obtained. The growth curves for the bacteria
transformed with the various plasmids show that
expression of the GT1 and GT2 chimeras is toxic. As can
be seen on the electrophoresis gel of the Laemmli 14%
SDS-PAGE type [24], the expression of TME1 fused to GST
is accompanied by the absence of a band migrating at
the expected size of 30 kDa. This implies that a very
low level of expression of the chimera is sufficient to
kill the bacteria. On the other hand, the GST-TME2
chimera is this time visible on the electrophoresis
gel, in the region of expected molecular mass of
30 kDa. The level of expression remains limited,
however.

The protein produced is not soluble despite the
presence of GST in the fusion. In fact, as shown in the
attached Figure 6, the solubilization, folding and
purification trials for the GST-TME2 chimera were a
failure.

To obtain the results represented in this Figure 6, the
GST and GST-TME2 proteins were expressed as described
in Figure 4, using 150 ml of culture medium. The
bacteria were then harvested by centrifugation and
suspended in buffer (20 mM KPO4, pH 7.7, 0.1 M NaCl,
1 mM EDTA, 1 mM NaN3) so as to have 100 OD/ml. Two ml
of each culture were removed for sonication with
30 sec pulses at an amplitude of 15%. After sonication,
a sample is taken for electrophoresis. It corresponds
to the well "To" in Figure 6 (corresponding to the
"total").

A first low-speed centrifugation (5000 x g, 15 minutes)
makes it possible to separate the non-ruptured bacteria
and the inclusion bodies from the soluble or membrane
proteins. The latter are found in the supernatant and a
sample is taken. It corresponds to the well "Surn" in
Figure 6.

The fraction containing GST alone is then treated with
an affinity resin that makes it possible to bind and
then elute specifically this protein (well "Af" of the
GST gel in Figure 6).

The fraction containing the non-soluble GST-TME2
protein is treated either with a mild detergent such as
Triton X100 (TX100), in the presence or absence of
NaCl, or with a more solubilizing but more
denaturing detergent such as sarkosyl, before again
being diluted in TX100 and passed over affinity resin.

The results in Figure 6 show that GST is present in the
soluble fraction, unlike the GST-TME2 fusion, which
indicates that the latter is insoluble. The supernatant
containing the GST is passed over an agarose-GSH resin
capable of binding GST. This GST is then eluted with an
excess of GSH (well marked "Af" of the GST gel in
Figure 6).

The pellet containing the GST-TME2 fusion is not
solubilized in the presence of a mild detergent such as
TX100 (with or without added NaCl, well "TX100 +/-
NaCl" of the GST-TME2 gel), but it can be solubilized
with a more aggressive detergent such as sarkosyl.
However, after dilution of the protein thus solubilized
in TX100, a mild detergent which should favour its
folding, the protein is not retained on the affinity
resin, unlike GST, which suggests that the fusion
protein cannot be folded.

These tests clearly indicate that the GST-TME2 protein
is produced in the form of inclusion bodies that cannot
be correctly folded.

Example 4: Expression of expression vectors encoding
the fusion proteins including an Asp-Pro site and a GST
site

The construction of the vectors was carried out as
described above for the two vectors encoding the
GST-TME1 and GST-TME2 chimeric proteins, so as to
produce the vectors encoding the GST-Asp-Pro-TME1 and
GST-Asp-Pro-TME2 chimeric proteins. They are summarized
in Table 3 below.



Table 3

Plasmid        Chimera,             Construct              Size,  Mass,
               abbreviation Fig. 4                         aa     Da
pGEXKT-dp-T1   GST-DP-TME1; GDPT1   1M-S233-dp-347M-A383   271    30718
pGEXKT-dp-T2   GST-DP-TME2; GDPT2   1M-S233-dp-717E-A746   265    30403

The amino acids (aa) are indicated with the one-letter 
code. The numbering of the sequences is done with 
respect to the proteins of origin, GST and viral 
polyprotein. That which refers to the membrane domains 
is indicated in italics. 
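
As a consistency check on Tables 2 and 3, the mass added by the DP dipeptide can be estimated from the average residue masses of aspartic acid and proline. The residue masses below are standard rounded values, not taken from the present description, and the function name is hypothetical.

```python
# Average residue masses in Da (amino acid minus one water molecule);
# standard rounded values, assumed here for illustration.
RESIDUE_MASS = {"D": 115.09, "P": 97.12}


def added_mass(residues: str) -> float:
    """Return the mass (Da) added to a chain by the given residues."""
    return sum(RESIDUE_MASS[aa] for aa in residues)


dp_mass = added_mass("DP")

# Differences between the masses listed with and without the DP site
# (Table 3 minus Table 2, for the TME1 and TME2 chimeras).
table_differences = [30718 - 30506, 30403 - 30191]

print(round(dp_mass), table_differences)  # 212 [212, 212]
```

Both tabulated differences agree with the expected contribution of one aspartic acid and one proline residue.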

The vectors were tested as described in the preceding
paragraph. The results obtained are shown on panels E
and F of the attached Figure 4.

The growth curves for the bacteria transformed with the
various plasmids show that the expression of the GDPT1
and GDPT2 chimeras is clearly less toxic than in the
previous cases. Panel F shows that, this time, TME1 is
produced due to the presence of the DP cleavage site.
Its level of expression, as can be seen in panel F, is
relatively moderate, but significant. GST-DP-TME2 is
clearly overproduced. The two proteins migrate in their
expected molecular mass region.

The effect of the addition of the DP dipeptide is as
significant as it is unexpected: it amplifies the
expression of the domains and suppresses their
toxicity. This attenuation of toxicity has not
previously been described for the DP dipeptide, whose
only property reported to date is its ability to
be cleaved by formic acid. Since the effect is observed
on two different peptides that are both initially toxic
for the bacterium, it is reasonable to think
that this property may extend to other hydrophobic and
toxic peptides.

The inventors verified that the site can indeed be
cleaved by formic acid: the cleavage is slow and
requires approximately 7 days at ambient temperature.
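
Cleavage at the Asp-Pro bond can be pictured with the following sketch, which splits a one-letter sequence between D and P. The function name and the short stand-in for the membrane domain are hypothetical; the linker sequence is the one given in the legend of Figure 2. The sketch models only the position of the cut, not the kinetics of the reaction.

```python
def cleave_asp_pro(sequence: str) -> list[str]:
    """Split a one-letter protein sequence at every Asp-Pro (D-P) bond.

    The upstream fragment keeps the aspartate at its C-terminus and the
    downstream fragment starts with the proline.
    """
    fragments = []
    start = 0
    for i in range(len(sequence) - 1):
        if sequence[i] == "D" and sequence[i + 1] == "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


linker = "SDLSGGGGGLVPRGS"  # end of GST plus thrombin site (from the text)
tm_placeholder = "IAGA"     # hypothetical stand-in for a membrane domain
print(cleave_asp_pro(linker + "DP" + tm_placeholder))
# ['SDLSGGGGGLVPRGSD', 'PIAGA']
```

In this model the released peptide retains the proline at its N-terminus, which is why the site is placed immediately upstream of the domain of interest.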

The assays of expression of these chimeras at low
temperature (20°C) overnight made it possible to
demonstrate that they are produced in native form. In
fact, it is possible to detect GST transferase activity
in the membrane fraction of the bacteria. In addition,
this activity is measured in solution when the
membranes are solubilized in the presence of a
non-ionic detergent such as β-D-dodecylmaltoside, after
centrifugation.

Example 5: Expression of expression vectors encoding 
the fusion proteins including an Asp-Pro site and a 
site encoding thioredoxin (TrX) 

The pET32a-TrX, pET32a-TrX-dp-TME1 and
pET32a-TrX-dp-TME2 expression vectors were constructed
as described above and were then introduced into
BL21(DE3)pLysS bacteria. The BL21(DE3)pLysS bacteria
were used in the interests of comparison with the
previous experiments since the expression of GST or of
its chimeras does not require the DE3-pLysS system. The
positive clones were cultured and induced as described
above.

The induction of expression was carried out with IPTG,
as for that of the domains alone. The characteristics
of the proteins produced are summarized in Table 4
below.



Table 4*

Plasmid             Chimera,             Construct               Size,  Mass,
                    abbreviation Fig. 4                          aa     Da
pET32a              Thioredoxin; TrX     1M-C189                 189    20397
pET32a-Gend-dp-T1   TrX-DP-TME1; TDPT1   1M-S115-PK-Gend-dp-T1   171    17796
pET32a-Gend-dp-T2   TrX-DP-TME2; TDPT2   1M-S115-PK-Gend-dp-T2   165    17481

* : T1 = TME1 and T2 = TME2



The amino acids (aa) are indicated with the one-letter
code. The numbering of the sequences is done with
respect to the proteins of origin, GST and viral
polyprotein. That which refers to the membrane domains
is indicated in italics. "Gend" refers to the C-terminal
sequence of the GST originating from the constructs with
the pGEXKT plasmid. It corresponds to the primary
peptide sequence SDLSGGGGGLVPRGS. The
thioredoxin-SDLSGGGGGLVPRGS-DP-(TME1 or TME2) chimeras
are shorter than the protein encoded in the vector of
origin since the insertion is effected immediately after
the thioredoxin.

In Figure 5:

A: the bacterial growth was followed by measuring the
increase in turbidity of each culture by optical density
at 600 nm as a function of time.

B: the bacteria were sampled as indicated for Figure 4.
They were then loaded onto a Laemmli SDS-PAGE type 14%
acrylamide electrophoresis gel and treated as indicated
for Figure 4.

As expected, and as shown by the growth curves
represented in the attached Figure 5A for the bacteria
transformed with the various plasmids, expression of the
TrX-DP-TME1 and TrX-DP-TME2 chimeras according to the
present invention is not toxic. The Laemmli 14% SDS-PAGE
[24] electrophoresis gel represented in the attached
Figure 5B shows that each chimera is overproduced.

The present invention therefore makes it possible to
produce, by genetic recombination, hydrophobic peptides
corresponding to the membrane domains of the E1 and E2
proteins of the hepatitis C virus envelope, the
expression of which was acknowledged to be lethal in the
techniques of the prior art. In addition, since the
effect is observed on two peptides that are quite
different and both initially toxic for the bacterium,
this indicates that the present invention applies to
other hydrophobic and toxic peptides.

Example 6: Effect of the DP dipeptide on the toxicity of
the TME1 and TME2 transmembrane domains expressed
without fusion protein in the bacterium

This example makes it possible to evaluate the antitoxic 
effect of the DP dipeptide inserted in the absence of 
GST or TrX fusion protein in accordance with the 
attached Claim 1. 

A) Materials: The pT7-7-ptTME1 and pT7-7-ptTME2 plasmids
are those which are described in Example 1. The
pT7-7-dp-ptTME1 and pT7-7-dp-ptTME2 plasmids were
constructed and cloned in pT7-7 (SEQ ID NO: 33) as
described in Example 1, but using the Nde I (5') and
EcoR I (3') sites of the plasmid. The upstream (5')
oligonucleotides integrate the dp sequence (gacccg)
after the 1st methionine (atg). The templates used to
generate each DNA were the pT7-7-ptTME1 and
pT7-7-ptTME2 plasmids. The sequences were verified
after cloning.

The oligonucleotides are as follows: 



i) Cloning of the sequence encoding (M)DP-TME1 in pT7-7:

OL19 (+): 5'-CG CATATG GACCCGATCGCTGGTGCT-3' (Nde I
site set off by spaces) = (SEQ ID NO: 15 of the attached
sequence listing);

OL20 (-): 5'-GAATTC CTAAGCGTCAACACCAGC-3' (EcoR I
site set off by spaces) = (SEQ ID NO: 16 of the attached
sequence listing).

ii) Cloning of the sequence encoding (M)DP-TME2 in
pT7-7:

OL28 (+): 5'-CG CATATG GACCCGGAATACGTTGTTC-3' (Nde I
site set off by spaces) = (SEQ ID NO: 26 of the attached
sequence listing);

OL29 (-): 5'-CA GAATTC CTAAGCTTCAGCCTGAGAG-3' (EcoR I
site set off by spaces) = (SEQ ID NO: 27 of the attached
sequence listing).

The pT7-7-dp-ptTME1 and pT7-7-dp-ptTME2 expression vectors
obtained are given in the attached sequence listing (SEQ
ID NO: 44 and SEQ ID NO: 45).
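
The structure of the four oligonucleotides listed above can be summarized in a short sketch: a few leading bases, the restriction site, the dp codons (for the upstream primers) and the first or last bases of the domain, with a stop codon on the downstream side. The helper names are hypothetical, and the TAG stop codon and the leading bases are simply read from the sequences given above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

NDE_I = "CATATG"   # contains the ATG start codon
ECOR_I = "GAATTC"
DP_CODONS = "GACCCG"
STOP = "TAG"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (upper case ACGT)."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_primer(domain_start: str, leading: str = "CG") -> str:
    """Leading bases + Nde I site + dp codons + first bases of the domain."""
    return leading + NDE_I + DP_CODONS + domain_start


def downstream_primer(domain_end: str, leading: str = "") -> str:
    """Leading bases + EcoR I site + reverse complement of (domain end + stop)."""
    return leading + ECOR_I + reverse_complement(domain_end + STOP)


# Reproduces OL19 and OL20 as listed for (M)DP-TME1.
print(upstream_primer("ATCGCTGGTGCT"))       # CGCATATGGACCCGATCGCTGGTGCT
print(downstream_primer("GCTGGTGTTGACGCT"))  # GAATTCCTAAGCGTCAACACCAGC
```

The same two helpers give OL28 and OL29 when called with the first and last bases of the TME2 coding sequence and, for OL29, the leading bases "CA".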

B) Legend of the attached Figures 7A and 7B: the
bacterial strain BL21(DE3)pLysS was transformed either
with the plasmid alone or with the various versions of
pT7-7 integrating the 4 constructs expressing TME1,
M-DP-TME1 (Figure 7A), or TME2, M-DP-TME2 (Figure 7B). M
represents methionine; it is present at the N-terminal
position of the peptides when the toxic proteins are
produced according to the present invention with the
pT7-7 plasmid.

The growth of the various clones was compared after
induction with IPTG, following the same protocol as
the chemical induction described in Example 2, and
averaged over the OD values of 4 different clones for
each construct.

C) Results:

Figures 7A and 7B show that the bacteria that have a
plasmid expressing the TME1 and TME2 proteins grow less
rapidly after induction than the control strain which is
transformed with the pT7-7 vector alone.

These results show that the strains transformed with the
plasmids expressing the M-DP-TME1 (SEQ ID NO: 50) and
M-DP-TME2 (SEQ ID NO: 51) versions according to the
invention grow significantly better than those that
express the transmembrane domains without DP. This is
true for TME1, and even more clearly so for TME2.

The conclusion is that the N-terminal insertion of DP in
accordance with the present invention contributes,
surprisingly, to a significant decrease in toxicity of
the expression of the membrane domains, in particular in
the absence of a soluble fusion protein such as GST or
thioredoxin.